Weaviate
Querying Overview
One way to specify a Weaviate query in Qarbine is to use a JSON-like structure. Below is an example to retrieve up to 3 movies that have some similarity to “dracula”.
{
"collection": "movies",
"limit": 3,
"nearText": "dracula"
}
Weaviate offers a variety of parameters to control how the comparison is done and what properties are returned. You can query Weaviate using a semantic (i.e. vector) search, a lexical (i.e. scalar) search, or a combination of the two. The former is for ‘similar’ oriented searches while the latter uses more traditional matching techniques (similar to a SQL WHERE clause).
Below is an example of a movie object stored in Weaviate from one of its sample data sets.
{
"metadata": {
"distance": 0.17359280586242676,
"certainty": 0.9132035970687866
},
"properties": {
"worst_rating": 1,
"director": "indar dzhendubaev",
"review_date": "12/20/2018",
"duration": "1H50M",
"url": "https://www.imdb.com/title/tt4057376/",
"title": "on - drakon",
"best_rating": 10,
"genres": "Adventure,Fantasy,Romance",
"actors": "matvey lykov,mariya poezzhaeva,stanislav lyubshin,pyotr romanov",
"keywords": "dragon,dragonslayer,uncharted island,wedding,3d",
"rating_value": 6.9,
"review_body": "Beautiful in every way. ...",
"movie_id": 44226,
"poster_link": "https://m.media-amazon.com/images..",
"description": "On - drakon is a movie….",
"date_published": "12/3/2015",
"rating_count": 4136,
"review_aurthor": "botfish-1"
},
"vector": [ 1,2,....],
"uuid": "c0b5e5dd-70da-4a86-90e4-66a84f2efe7c"
}
Unlike the strictly columnar result rows found in SQL databases, the resulting Weaviate objects are returned as nested JSON objects. In this example there are first level fields of metadata, properties, vector, and uuid. The “properties” field is similar in concept to a SQL row or MongoDB document. It can be a nested value as well. The “uuid” field is the Weaviate object identifier.
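Because each result is nested, code that consumes the answer set may want a flat, row-like view. The following is a minimal Python sketch of one way to do that. The function name `flatten_result` and the `metadata_` key prefix used for flattened metadata are illustrative choices for this example and are not part of the Weaviate or Qarbine APIs.

```python
def flatten_result(obj, include_vector=False):
    """Flatten one nested Weaviate-style result object into a single dict."""
    row = {"uuid": obj.get("uuid")}
    # Properties become top-level columns, much like a SQL row.
    row.update(obj.get("properties", {}))
    # Metadata values get a prefix so they cannot clash with property names.
    for key, value in obj.get("metadata", {}).items():
        row["metadata_" + key] = value
    if include_vector:
        row["vector"] = obj.get("vector")
    return row


# Abbreviated version of the movie object shown above.
sample = {
    "metadata": {"distance": 0.1736, "certainty": 0.9132},
    "properties": {"title": "on - drakon", "rating_value": 6.9},
    "vector": [0.1, 0.2],
    "uuid": "c0b5e5dd-70da-4a86-90e4-66a84f2efe7c",
}
row = flatten_result(sample)
```

The vector is left out by default in this sketch because it is usually large and rarely needed for display.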
Prerequisites
Prior to using Qarbine’s embeddings(...) macro function or the SQL-like query function nearText(...), the Qarbine Administrator must first configure “AI Assistant(s)”. The AI Assistants provide access to various popular Generative AI services and are referenced using an alias. Check with your Qarbine administrator for which ones are available and their proper use. For example, when using dynamic query vector embeddings, the model used by the AI Assistant must be compatible with the one used to generate the original embedding values in the database.
Query Specification Options
Primary Options
The primary specification options are described below.
Field | Description |
---|---|
collection | The Weaviate collection to perform the query upon. |
operation | The value is a string. The default is ‘nearText’. Possible values are 'bm25', 'nearText', 'hybrid', 'fetchObjects', 'overAll', and 'nearVector'. See https://weaviate.io/developers/weaviate/api/graphql/search-operators |
nearText | The value is a string with the similarity phrase. For example “dracula movies”. The nearText argument can be used by query.nearText(), hybrid.nearText(), or generate.nearText(). Use the operation field to specify which interaction is wanted. See https://weaviate.io/developers/weaviate/api/graphql/search-operators#neartext |
nearObject | The UUID string of the object from which to consider matches “near”. See https://weaviate.io/developers/weaviate/api/graphql/search-operators#nearobject |
nearVector | The value is an array of numbers (AKA the raw vector). It indicates an operation of “nearVector”. See https://weaviate.io/developers/weaviate/api/graphql/search-operators#nearvector |
nearImage | The base64 representation of an image for similarity matching. See https://weaviate.io/developers/weaviate/search/image#by-the-base64-representation |
filters | The GraphQL filter specification. |
limit | The maximum number of matches to return. For more information see https://weaviate.io/developers/weaviate/search/basics#paginate-with-limit-and-offset |
offset | Valid only with fetchObjects operation. It indicates how many objects of the answer set to skip over as part of the returned answer set. For more information see https://weaviate.io/developers/weaviate/search/basics#paginate-with-limit-and-offset |
returnProperties | Which properties of the object to return in the answer set. See https://weaviate.io/developers/weaviate/search/basics#specify-object-properties |
returnMetadata | A list of strings. The fields may include id, creationTimeUnix, lastUpdateTimeUnix, certainty, distance, featureProjection and classification. Further additional properties may be available for each query, depending on the query type as well as enabled Weaviate modules. The generate field can be used to perform a generative search. A generate query will cause corresponding additional result fields to be available, such as singleResult, groupedResult and error. See https://weaviate.io/developers/weaviate/search/basics#retrieve-metadata-values and https://weaviate.io/developers/weaviate/api/graphql/additional-properties |
sort | The list of sorting rules. Each element has a property and ascending flag. For example, {sorts: [ { property: 'question', ascending: false} ] }. You can sort by metadata values as well. Prefix them with an underscore (i.e., _id). See https://weaviate.io/developers/weaviate/api/graphql/additional-operators#sorting |
includeVector | Boolean indicating if the row vectors should be included in the answer set. For Qarbine, the default is false to reduce the answer set size. |
The “hybrid” operation combines the results of a vector search and a keyword (BM25F) search. You can set the weights or the ranking method. For more information see https://weaviate.io/developers/weaviate/search/hybrid.
You can get information from Weaviate to explain its search results by including “explainScore” in the returnMetadata list. For example,
{
operation: 'hybrid',
collection : 'Question',
limit: 2,
nearText : "food",
returnProperties: ['question', 'answer'],
returnMetadata: ['score', 'explainScore']
}
For more information see https://weaviate.io/developers/weaviate/search/hybrid#explain-the-search-results.
Secondary Options
The secondary specification options are described below.
Field | Description |
---|---|
groupedTask | Use with “generate” option. An example using the Jeopardy Q&A data is 'Write a summary of each category'. See https://weaviate.io/developers/weaviate/search/generative#grouped-task-search |
singlePrompt | Use with “generate” option, but with no groupedTask element above. See https://weaviate.io/blog/typescript-client-beta#simplified-methods |
generatePrompt | Use with “generate” option. An example using the Jeopardy Q&A data is 'Convert this quiz question: {question} and answer: {answer} into a trivia tweet.'. |
distance | The maximum distance for similar objects. For more information see https://weaviate.io/developers/weaviate/search/aggregate#aggregate-int-properties |
queryProperties | This argument allows you to specify the relative value of an object's properties in the keyword search. Higher values increase the property's contribution to the search score. Pass a list of strings to boost certain properties. An example using the Jeopardy Q&A data is ['question^2', 'answer']. For more information see https://weaviate.io/developers/weaviate/search/bm25#use-weights-to-boost-properties |
rerank | The rerank field can be used to reorder the search results. A rerank query will cause a corresponding additional score field to be available. For more information see https://weaviate.io/developers/weaviate/search/rerank |
targetVector | The name of the vector to use. For more information see https://weaviate.io/developers/weaviate/search/generative#named-vectors |
alpha | Use the alpha argument to change how much each search affects the results. An alpha of 1 is a pure vector search. An alpha of 0 is a pure keyword search. For more information see https://weaviate.io/developers/weaviate/search/hybrid#balance-keyword-and-vector-search |
fusionType | Ranked Fusion is the default fusion algorithm. To use objects' keyword and vector search scores instead of ranks, use Relative Score Fusion. To use autocut with the hybrid operator, use Relative Score Fusion. For more information see https://weaviate.io/developers/weaviate/search/hybrid#change-the-ranking-methodf |
returnReferences | For fetchObjects operation only. A sample JSON argument is [ { linkOn: 'hasCategory', returnProperties: ['title'], } ] . For more information see https://weaviate.io/developers/weaviate/search/basics#retrieve-cross-referenced-properties |
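To make the role of alpha in a hybrid search concrete, here is a conceptual Python sketch of a weighted blend of two scores. It assumes both scores have already been normalized to the range 0 to 1, and it is only an illustration of the weighting idea, not Weaviate's actual fusion implementation. The function name `blend_scores` is hypothetical.

```python
def blend_scores(vector_score, keyword_score, alpha):
    """Weighted blend of two normalized scores (each assumed to be in 0..1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # alpha weights the vector side; the remainder weights the keyword side.
    return alpha * vector_score + (1.0 - alpha) * keyword_score


pure_vector = blend_scores(0.9, 0.2, alpha=1)   # only the vector score counts
pure_keyword = blend_scores(0.9, 0.2, alpha=0)  # only the keyword score counts
balanced = blend_scores(0.8, 0.4, alpha=0.5)    # both count equally
```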
Weaviate Filter Overview
Weaviate filters are similar to SQL WHERE clauses in concept, but not at all in syntax! Briefly, the operations are And, Or, Equal, NotEqual, GreaterThan, GreaterThanEqual, LessThan, LessThanEqual, Like, WithinGeoRange, and IsNull. Two additional operations, ContainsAny and ContainsAll, are available only for array and text properties. Weaviate does not support an operator to invert a filter (e.g. Not Like ...).
Below is a query specification with a Weaviate filter to retrieve movies similar to “dracula” with a rating of at least 8.
{
"filters": {
"operator": "GreaterThanEqual",
"target": { "property": "rating_value" },
"value": 8
},
"operation": "nearText",
"nearText": [ "dracula" ]
}
Filtering details can be found at https://weaviate.io/developers/weaviate/search/filters.
Generative Search
Generative search, also known as "Retrieval Augmented Generation" (RAG), is a multi-stage process. First Weaviate performs a query, then it passes the retrieved results and a prompt to a large language model (LLM), to generate a new output. Weaviate must be configured to use a generator module. Query your database to retrieve one or more objects. Use the query results to generate a new result using either:
- singlePrompt or
- groupedTask.
For details, see https://weaviate.io/developers/weaviate/search/generative.
Single Prompt Search
Single prompt search returns a generated response for each object in the query results. Define object properties – using {prop-name} syntax – to interpolate retrieved content in the prompt.
The properties you use in the prompt do not have to be among the properties you retrieve in the query. Below is an example.
{
collection: 'Question',
nearText: ['World history'],
singlePrompt: 'Convert this quiz question: {question} and answer: {answer} into a trivia tweet.',
limit: 4,
returnProperties: ['answer']
}
For more details see https://weaviate.io/developers/weaviate/search/generative#single-prompt-search.
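The {prop-name} interpolation described above can be pictured with a short Python sketch. Weaviate performs this substitution itself as part of the generative search, so the code is purely illustrative. The function name `fill_prompt` and the sample question and answer values are made up for this example.

```python
import re


def fill_prompt(template, properties):
    """Replace each {prop-name} placeholder with the object's property value."""
    def lookup(match):
        name = match.group(1)
        # Unknown placeholders are left untouched rather than raising an error.
        return str(properties.get(name, match.group(0)))

    return re.sub(r"\{([^{}]+)\}", lookup, template)


template = ("Convert this quiz question: {question} and answer: {answer} "
            "into a trivia tweet.")
obj = {"question": "Who wrote Dracula?", "answer": "Bram Stoker"}
prompt = fill_prompt(template, obj)
```

One prompt like this is produced per retrieved object, which is why a single prompt search returns one generated response for each result.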
Grouped Task Search
Grouped task search returns one response that includes all of the query results. By default grouped task search uses all object properties in the prompt. Below is an example.
{
collection: 'Question',
nearText: ['Cute animals'],
groupedTask: 'What do these animals have in common, if anything?',
groupedProperties: ['answer', 'question'],
limit: 4,
returnProperties: ['answer']
}
For more details see
https://weaviate.io/developers/weaviate/search/generative#grouped-task-search.
Qarbine Enhanced Interaction Options
SQL Oriented Filtering
Primary Options
Recall that Weaviate supports semantic (i.e. vector) search and a lexical (i.e. scalar/matching) search. The use of GraphQL can be a bit verbose and cumbersome though. To improve readability and productivity when authoring Weaviate retrievals, Qarbine provides a SQL oriented option. For example, the goal to retrieve up to 3 movies that have some similarity to “dracula” can look like the following using a query specification.
{
"collection": "movies",
"limit": 3,
"nearText": "dracula"
}
The Qarbine SQL equivalent is simply
select * from movies where nearText("dracula") limit 3
Qarbine’s Weaviate integration goes much further though and extends to the filtering features as well. Weaviate still requires a GraphQL specification but Qarbine is your co-pilot translating SQL-oriented queries into their Weaviate GraphQL equivalents. Adjusting our movie retrieval specification above and adding criteria for the movies to have a rating of at least 8 would be
select * from movies where nearText("dracula") and rating_value >= 8 limit 3
The GraphQL equivalent is much more verbose and quite cumbersome to define. Qarbine allows you to avoid this frustration in many cases. You can always use the JSON structure though at any time.
In some cases the Qarbine Data Source will have literally just the SQL statement above and nothing more. There are techniques to blend the ease of using SQL along with the powerful features of Weaviate within a Qarbine JSON specification object. The table below lists the fields that drive this definition.
JSON Field | Description |
---|---|
sql | The SQL statement can affect all of the primary options listed above. |
sqlWhere | The string can affect all of the primary options listed above except for returnProperties, returnMetadata, includeVector, and collection. |
sortBySql | The ORDER BY clause specifying how to sort the answer set. |
Here is a simple example of combining the SQL and query specification approaches. The effective result is the same as the example query specification above.
{
"sql": "select * from movies limit 3",
"nearText": "dracula"
}
Note that a SQL numeric list is enclosed in parentheses while one in the specification is enclosed in brackets. That is a subtle nuance across the SQL and JSON syntax standards.
The mapping of the standard SQL clauses to their Weaviate equivalents is described below.
Clause | Description |
---|---|
SELECT | The names of the fields to return. Specifying “*” indicates all object fields; in that case the returnProperties field is not set and Weaviate returns all of the properties. You can also reference metadata properties by prefixing them with “metadata_”; the resulting list of strings is passed as the returnMetadata value. Here are some examples: SELECT * …; SELECT title, rating_value …; SELECT *, metadata_score …; SELECT title, rating_value, metadata_score …. Including “vector” in the SELECT list sets the includeVector field of the query specification. Including ‘!uuid’ removes the uuid field from the answer set rows. |
FROM | The name of the Weaviate collection. This value sets the “collection” field in the query specification. |
WHERE | See the discussion below. The effect is to set the “filters” field of the query specification. |
ORDER BY | The sorting rules in “column ASC” or “column DESC” format. This sets the “sort” field of the query specification. |
OFFSET | Indicates how many objects of the answer set to skip before returning objects. This sets the “offset” field of the query specification. |
LIMIT | Indicates at most how many elements to return. This sets the “limit” field of the query specification. |
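The SELECT list handling described in the table can be sketched in Python as follows. This is an illustration of the described mapping under stated assumptions, not Qarbine's implementation. The function name `map_select_list` and the `dropUuid` flag are hypothetical names introduced for this example.

```python
def map_select_list(items):
    """Map SELECT list items onto query specification fields (illustrative)."""
    spec = {}
    properties, metadata = [], []
    select_all = False
    for item in (i.strip() for i in items):
        if item == "*":
            select_all = True            # all properties: returnProperties stays unset
        elif item == "vector":
            spec["includeVector"] = True
        elif item == "!uuid":
            spec["dropUuid"] = True      # hypothetical flag name for this sketch
        elif item.startswith("metadata_"):
            metadata.append(item[len("metadata_"):])
        else:
            properties.append(item)
    if properties and not select_all:
        spec["returnProperties"] = properties
    if metadata:
        spec["returnMetadata"] = metadata
    return spec


star_with_score = map_select_list(["*", "metadata_score"])
named_columns = map_select_list(["title", "rating_value", "metadata_score"])
```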
Bear in mind that some combinations of query fields may not make sense in the Weaviate world. The WHERE clause criteria can be in a variety of traditional SQL forms and may include Qarbine specific functions described below. For example,
select * from Movies where nearText("dracula")
results in a query specification with these fields,
collection: "Movies",
nearText: "dracula"
There is no filter value in the query specification. Some additional Qarbine-defined SQL functions are listed below.
nearImage(base64ImageData)
nearObject(aUUID)
nearText(aPhrase)
bm25NearText(aPhrase)
hybridNearText(aPhrase)
nearVector(number1, number n …)
vector = (number 1, number n ...) ← A different way of expressing nearVector()
nearNamedVector(useVector, number1, number n …)
Function | Description |
---|---|
nearImage | This clause is removed from the WHERE criteria and its base64 argument is set into the “nearImage” field of the query specification. |
nearObject | This clause is removed from the WHERE criteria and its UUID argument is set into the “nearObject” field of the query specification. |
nearVector | This clause is removed from the WHERE criteria and its list of numbers is set into the “nearVector” field of the query specification. |
nearNamedVector | Use this when a collection has multiple vector indices and you want to use a specific vector. |
nearText | This clause is removed from the WHERE criteria and its argument is set into the “nearText” field of the query specification. The nearText argument can be used by query.nearText(), hybrid.nearText(), or generate.nearText(). Indicate which operation is wanted in the query specification. |
hybridNearText | Similar to nearText() above but also sets the operation field to “hybrid”. |
bm25NearText | Similar to nearText() above but also sets the operation field to “bm25”. |
withOption | Pass in the specification field name and the value to set. This clause is removed from the WHERE clause. |
withOptions | Set several specification fields at once. The format is withOptions(key1, value1, keyN, valueN). The key argument may use dot notation when setting the inner value of a component object. |
A more constraining query from our initial one is
select * from Movies where nearText("dracula") and rating_value >= 8
The above references the rating_value property as a virtual SQL column. This results in
{
"filters": {
"operator": "GreaterThanEqual",
"target": {
"property": "rating_value"
},
"value": 8
},
"operation": "nearText",
"nearText": "dracula"
}
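The translation from a SQL comparison to the filter structure above can be sketched in Python. The operator names come from the Weaviate filter list earlier in this document; the function name `comparison_to_filter` and the treatment of `<>` as NotEqual are assumptions for this illustration and do not describe Qarbine's internal code.

```python
# SQL comparison operators mapped to the Weaviate operator names listed earlier.
SQL_TO_WEAVIATE = {
    "=": "Equal",
    "!=": "NotEqual",
    "<>": "NotEqual",   # assumption: treated the same as !=
    ">": "GreaterThan",
    ">=": "GreaterThanEqual",
    "<": "LessThan",
    "<=": "LessThanEqual",
    "like": "Like",
}


def comparison_to_filter(prop, sql_operator, value):
    """Build the filter object for a single 'property operator value' clause."""
    operator = SQL_TO_WEAVIATE.get(sql_operator.lower())
    if operator is None:
        raise ValueError(f"unsupported operator: {sql_operator}")
    return {"operator": operator, "target": {"property": prop}, "value": value}


flt = comparison_to_filter("rating_value", ">=", 8)
```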
Filters can reference Weaviate metadata fields as well using the case-sensitive “metadata_” prefix. An example is based on creation time using the ‘_creationTimeUnix’ field. For details see
https://weaviate.io/developers/weaviate/search/filters#filter-by-metadata
https://weaviate.io/developers/weaviate/api/graphql/additional-operators#metadata-properties
https://weaviate.io/developers/weaviate/search/filters#metadata-filter---by-object-timestamp
Here is a query to return all the properties and the object metadata timestamp values as well.
select *, metadata_creationTimeUnix , metadata_lastUpdateTimeUnix
from question
limit 2
Secondary Options
When using SQL the ‘operation’ field can be set using
withOperation(aString)
in the WHERE clause. For example,
select * from Movies where nearText("horror") and withOperation('hybrid')
The “and withOperation('hybrid')” text is effectively removed from the WHERE clause. The above is equivalent to
select * from Movies where hybridNearText("horror")
Weaviate provides many retrieval options which were categorized as “secondary options” above. Some of these include groupedTask, singlePrompt, rerank, distance, and queryProperties. To pass such an argument as part of the Weaviate request, use Qarbine’s “withOption(...)” SQL function. The first argument to the function is the case-sensitive name of the secondary option as listed in the table above. The remaining arguments are the value(s) for that option. This can be:
- a single number or string argument,
- multiple arguments which indicate a list option,
- a JSON string of the form “{...}” to indicate a JSON object, or
- a JSON string of the form “[...]” to indicate a JSON array.
For example,
select answer from Question where nearText('Cute animals')
and withOption( "groupedTask", 'What do these animals have in common, if anything?')
limit 4
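The four value forms listed above can be sketched in Python as follows. This is a hedged illustration only: the function name `option_value` is hypothetical, and the sketch assumes strict JSON with double-quoted keys and strings, whereas the examples in this document also show a more relaxed single-quoted style that this code does not handle.

```python
import json


def option_value(*args):
    """Interpret the arguments that follow the option name in withOption()."""
    if not args:
        raise ValueError("an option needs at least one value")
    if len(args) > 1:
        return list(args)                # multiple arguments indicate a list
    value = args[0]
    if isinstance(value, str):
        text = value.strip()
        is_object = text.startswith("{") and text.endswith("}")
        is_array = text.startswith("[") and text.endswith("]")
        if is_object or is_array:
            return json.loads(text)      # assumes strict JSON (double quotes)
    return value                         # a single number or plain string


as_list = option_value("question^2", "answer")
as_json_array = option_value('["question^2", "answer"]')
as_json_object = option_value('{"property": "question", "query": "publication"}')
```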
Below are several notional examples of using withOption(). Adjust the first argument to the desired secondary option name.
select * from Question where nearText("dracula movies")
and withOption("simpleNumberOption", 123)
and withOption("simpleStringOption", "hello")
// Use the next approach with queryProperties of 1 or more.
// This example has 2 arguments to withOption().
and withOption( "optionWithJsonList", "[ 'question^2', 'answer' ]" )
// Use the next approach with queryProperties of 2 or more.
// This example has 3 arguments to withOption().
and withOption( "optionWithList", 'question^2', 'answer')
// Use the next approach with rerank.
and withOption( "optionWithJsonObject", "{ property: 'question', query: 'publication'}" )
The “withOperation(string)” function offers another way to set the operation field of the query specification. It is a shortcut for withOption("operation", someString).
You may also use the plural form, withOptions(key1, value1, keyN, valueN) to set multiple native options.
The key argument may use dot notation when setting the inner value of a component object.
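The dot notation for keys can be pictured with the Python sketch below. It is one plausible reading of the behavior described above, not Qarbine's implementation. The helper names `set_option` and `with_options` are hypothetical, and the rerank values are borrowed from the notional example earlier in this section.

```python
def set_option(spec, key, value):
    """Set spec[key], creating nested dicts for dotted keys such as 'a.b'."""
    parts = key.split(".")
    target = spec
    for part in parts[:-1]:
        target = target.setdefault(part, {})
    target[parts[-1]] = value
    return spec


def with_options(spec, *pairs):
    """Apply key1, value1, keyN, valueN pairs to the specification."""
    if len(pairs) % 2 != 0:
        raise ValueError("expected key/value pairs")
    for key, value in zip(pairs[0::2], pairs[1::2]):
        set_option(spec, key, value)
    return spec


spec = with_options(
    {"collection": "Question"},
    "alpha", 0.25,
    "rerank.property", "question",
    "rerank.query", "publication",
)
```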
Prompt Considerations
The nearImage value can be obtained through a prompt element of a Qarbine Prompt.
Reviewing the Generated Specification
You can enter criteria of the form “EXPLAIN SELECT ….” to have the SQL statement processed and have the returned answer set be the underlying query specification. For example enter and run
explain
select * from Movies where _id = "2ea875a4-3317-43af-ae4d-2ea11a673852"
Select the single result element and its details are shown to the right.
Click the “+” to expand all of the JSON object fields.
A convenient way of specifying this is to have “explain” on the first line and the rest of your SQL on the next lines.
explain
select *
from Movies
where _id = "2ea875a4-3317-43af-ae4d-2ea11a673852"
Then simply “comment out” the first line when not in use
// explain
select *
from Movies
where _id = "2ea875a4-3317-43af-ae4d-2ea11a673852"
You can also use “explain: true” in the JSON query specification for similar information.
Another way to get the specification is to press ALT and click the run button. For the query
select question, answer,metadata_score
from Question
where nearText( "food") and category = 'SCIENCE'
limit 2
the result is shown below.
Any “explain SELECT” or “explain: true” takes precedence over the ALT-click interaction.
Mapping Considerations
The Weaviate UUID metadata value can be referenced in queries using “_id”. To retrieve objects sequentially after a known object you can use
WHERE _id > "UUID_Value"
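Paging through a collection with this clause might look like the Python sketch below. It assumes a hypothetical `run_query` callable that accepts a SQL string and returns a list of result objects, each with a "uuid" key, ordered by UUID. Neither that callable nor the helper names `page_sql` and `iterate_objects` are part of Qarbine or Weaviate.

```python
def page_sql(collection, limit, last_uuid=None):
    """Build the SQL for one page, starting after last_uuid when given."""
    where = f' where _id > "{last_uuid}"' if last_uuid else ""
    return f"select * from {collection}{where} limit {limit}"


def iterate_objects(run_query, collection, limit=100):
    """Yield objects page by page; run_query is a hypothetical callable."""
    last_uuid = None
    while True:
        page = run_query(page_sql(collection, limit, last_uuid))
        if not page:
            return
        yield from page
        # Remember where this page ended so the next page starts after it.
        last_uuid = page[-1]["uuid"]
        if len(page) < limit:
            return  # a short page means there is nothing left to fetch
```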
The SQL IN clause maps to ContainsAny. There must be 2 or more elements in the list. For example,
select * from movies where title in ("zhil-byl pyos", "The Great Escape")
The SQL “xx BETWEEN aa AND bb” maps to the equivalent of
xx >= aa and xx <= bb
IS NULL filtering requires the target Weaviate class to be configured to index null state. For more information see https://weaviate.io/developers/weaviate/api/graphql/filters#by-null-state.
The LIKE operator filters text data based on partial matches. It can be used with the following wildcard characters:
A “?” for exactly one unknown character
“car?” matches cart, care, but not car
An “*” for zero, one or more unknown characters
“car*” matches car, care, carpet, etc.
“*car*” matches car, healthcare, etc.
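The wildcard semantics can be checked with a small Python sketch that translates a LIKE pattern into a regular expression. This only illustrates how “?” and “*” behave on a single word. Weaviate applies the operator to tokenized text according to the property configuration, so real results may differ. The function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern using '?' and '*' into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one unknown character
        elif ch == "*":
            parts.append(".*")           # zero, one or more unknown characters
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like_matches(pattern, text):
    """Return True when the whole text matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(text) is not None
```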
Column names which are SQL keywords such as where, order, select, or from must be double quoted. For example,
select * from Question where nearText("dracula movie")
// The 'where' column conflicts with SQL WHERE :-(
// and where = 123
// Double quote the column name.
and "where" = 123
Convenience Functions
Weaviate has several special purpose filtering options that Qarbine provides convenience functions to access within SQL statements. This maintains SQL’s style and still enables access to Weaviate’s many vector database querying features.
Filtering by Geographic Range
To filter by geographic range use
withinGeoRange(property, latitude, longitude, maxDistance)
Per the Weaviate documentation, currently geo-coordinate filtering is limited to the nearest 800 results from the source location, which will be further reduced by any other filter conditions and search parameters. See the following for more details
https://weaviate.io/developers/weaviate/search/filters#by-geo-coordinates.
Property Length
To filter by property length use a clause similar to
len(property) < 23
The given property must be indexed to be filterable! See the following for more details
https://weaviate.io/developers/weaviate/search/filters#metadata-filter---by-object-property-length
Timestamp Value
To filter by a timestamp you can use a clause similar to
property < toTimestamp(timestampString)
The timestampString is converted into a JavaScript Date object using its standard constructor options. That instance’s ISO string value is used as the operand value. The argument may just be a year, month, and day value as well. Here is an example clause using the Weaviate object creation timestamp metadata field.
_creationTimeUnix = toTimestamp("Apr 18 2024")
The Weaviate documentation states that timestamps must be indexed to be filterable! You can add `IndexTimestamps: true` to the InvertedIndexConfig in the collection. Refer to the Weaviate documentation for details.
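The conversion performed by toTimestamp() can be approximated in Python as shown below. This is a rough sketch under assumptions: it accepts only one date format and interprets the value as UTC, whereas the JavaScript Date constructor accepts many formats and typically uses the local time zone. The function name `to_timestamp` and its `fmt` parameter are hypothetical.

```python
from datetime import datetime, timezone


def to_timestamp(text, fmt="%b %d %Y"):
    """Parse a date string and return an ISO 8601 string in UTC."""
    # Assumes an English month abbreviation such as "Apr" and a UTC interpretation.
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return parsed.isoformat().replace("+00:00", "Z")


iso_value = to_timestamp("Apr 18 2024")
```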
Argument List Rollup
Use “toList( a, b, c, …)” to create a list object aggregating all of the arguments. This may be useful for cross reference criteria. For example,
toList("inPublication", "Publication", "name") = "New Yorker"
ends up in GraphQL as
"filters": {
"operator": "Equal",
"target": {
"property": [
"inPublication",
"Publication",
"name"
]
},
"value": "New Yorker"
},
For more information see
https://weaviate.io/developers/weaviate/api/graphql/filters#by-cross-references.
ContainsAll
The Weaviate ContainsAll operator works on text properties and takes an array of values as input. It will match objects where the property contains all of the values in the array. Qarbine has a convenience function to generate the necessary Weaviate syntax. The first argument is the field name and the rest are the list elements. Here is an example.
select title, description
from movies
where containsAll(description, "friends", "movie")
ContainsAny
The Weaviate ContainsAny operator works on text properties and takes an array of values as input. It will match objects where the property contains any (i.e. one or more) of the values in the array. Qarbine has a convenience function to generate the necessary Weaviate syntax. The first argument is the field name and the rest are the list elements. Here is an example.
select title, description
from movies
where containsAny(description, "France", "monster")
This is an alternative to using the SQL ‘property IN (listOfValues)’ as described above.
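The difference between the two operators can be illustrated with a short Python sketch. It uses simple lower-cased whitespace tokenization, which is a simplification: Weaviate's real matching depends on how the property is tokenized and indexed. The function names and the sample description are made up for this example.

```python
def tokens(text):
    """Lower-case whitespace tokenization; a simplification for this sketch."""
    return set(text.lower().split())


def contains_all(text, *values):
    """True when every value appears as a token in the text."""
    found = tokens(text)
    return all(v.lower() in found for v in values)


def contains_any(text, *values):
    """True when at least one value appears as a token in the text."""
    found = tokens(text)
    return any(v.lower() in found for v in values)


description = "Two friends chase a monster across France"
```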
AfterObject
This function formats the Weaviate query specification to return objects after the given UUID.
No other criteria may be used. It sets the “after” field of the query specification. The argument is the UUID value. An example is
select * from question
where afterObject( "218a64c9-04ae-489c-b758-9dc7573e0c7a")
limit 2
For more information see https://weaviate.io/developers/weaviate/api/graphql/additional-operators#cursor-with-after.
Qarbine Runtime Variable References
Your Qarbine Data Sources may use variable placeholders in their definition. For example,
select * from question
where nearText( [! @someUserInputPhrase !] )
limit 2
Query by Example and Report by Example
The Query by Example and Report by Example tools are aware of standard comparison operators.
To obtain the EXPLAIN information in Data Source Designer, QBE or RBE hold down the ALT key when clicking the run button.
References
For more information see https://weaviate.io/developers/weaviate/search/filters.